Abstract- In this decade and into the 21st century, NASA will have missions including the
Space Station and the Earth-related Planet Sciences. Supporting these missions requires a high degree of
sophistication in machine automation and an increasing data processing throughput rate.
Meeting these challenges requires intelligent machines designed to support the necessary
automation in a remote and hazardous space environment. There are two approaches to designing
these intelligent machines. One is the knowledge-based expert system approach, namely AI. The
other is a non-rule approach based on parallel and distributed computing for adaptive fault
tolerance, namely Neural or Natural Intelligence (NI). The union of AI and NI is the solution to
the problem stated above.

The NI segment of this unit extracts features automatically by applying Cauchy
simulated annealing [Phys. Lett. A122, p. 157; Proc. IEEE, V. 75, p. 1538] to a mini-max cost energy
function. The features discovered by NI can then be passed to the AI system for further processing,
and vice versa. This passing increases reliability, for AI can follow the NI-formulated algorithm
exactly, and can provide the context knowledge base as the constraints of neurocomputing. Such
integration is exemplified by pattern recognition in the human visual system, for instance the
tracking of gray-scaled objects. Consequently, AI and NI can work together to solve the same
problem by unifying into an intelligent processor.

The mini-max cost function that solves for the unknown feature can furthermore give us a top-down
architectural design of neural networks by means of a Taylor series expansion of the cost
function. A typical mini-max cost function consists of (1) the sample variance of each class in the
numerator, and (2) the separation of the centers of the classes in the denominator. Thus, when the
total cost energy is minimized, the conflicting goals of intraclass clustering and interclass
segregation are achieved simultaneously. The Taylor expansion variable is a neuronic vector
representation which traces along a Peano curve. A selective space-filling capability exists
when a more detailed spatial resolution becomes desirable in the part of the picture where an
interesting change occurs [IJCNN-90, D.C., p. 11-76].

INTRODUCTION 

Research and operations that support NASA's missions have experienced an increasing volume of data
that requires automated information processing, among other needs (e.g. the Discovery shuttle between the
space station and the earth, shown in Fig. 1, Top). One necessity is the next generation of smart sensors. They
are needed for two reasons. First, they are needed to perform multisensor data auto-fusion (between thematic
mapper spectral band imagery and high spatial resolution imagery) in order to improve the picture resolution
beyond geometrical corrections and proper registrations. Second, they are needed to extract features to identify
the space rocket boosters shown in Fig. 1, Center (provided by courtesy of T. Dworetzky). From left to right,
these are Goddard (1941), V-2 (German, 1944), Redstone (1961), Atlas Centaur (1962), Delta 3920 (1982),
Titan 34D (1982), Saturn V (1967), Ariane (European, 1979), Energia (Soviet, 1987), and Conestoga II (Future).
Automated feature extraction can also be useful for updating maps and for helping to manage earth's resources.
For example, an extra road through the palm forest was discovered by the Environmental Research Institute of
Michigan (ERIM) in Fig. 1 (Bottom) D.

The trend in modern telecommunications is toward multi-media, higher speed, and increased
intelligence (Fig. 2 (a)). Thus, another application of intelligent machines is, according to NTT Review (Vol. 1,
No. 1, May 1989), a Broad-Band Integrated Service Digital Network (B-ISDN), which has been proposed and will
probably undergo construction around 1995 (see Fig. 2 (b), used with permission). B-ISDN will have the
capability of processing voice, images, and text simultaneously, based on neurocomputing.

Figure 1. (Top) NASA's Space Shuttle Discovery. (Center) Feature extraction to identify various space rocket
boosters. (Bottom) Automatic feature extraction to update maps and to help manage earth's resources.

Figure 2. (a) The trend towards multi-media communication. (c) Mixing of voice and image spectral components
by an associative memory.








[Figure 2 (b) panel label: Broad-band ISDN (B-ISDN)]

Associative memory can mix the voice and image spectral components by vector outer products (shown in
DARPA's Report as a dotted matrix array in Fig. 2 (c)). This cross-correlation information can be processed at a
data rate of about 3 Gb/s. Usually such a high data rate requires optical computing based on optical switching
and coherent optical transmission. However, neurocomputing's debut in telecommunications is predicted by NTT
to be five years earlier than that of optical computing, despite extensive research efforts in optical computing by
AT&T and others.


REVIEW OF NEURAL NETWORK LEARNING ALGEBRA 

Neural network computing is a nonlinear system that satisfies four non-principles, with a fifth
non-principle remaining to be worked out. These are: (1) a nonlinear threshold logic of neurons, (2) a nonlocal
associative memory, (3) a nonstationary neurodynamics, and (4) a nonconvex system energy, meaning more
than one extremum in the energy landscape. The first has been known for 30 years, since Rosenblatt's
perceptron was proposed as a random collection of neurons. Minsky and Papert showed it to be
insufficient for natural intelligence, which contributed to the birth of AI. These four non-principles can be
approximated by (1) piecewise-linear, namely binary neurons, (2) piecewise-local, namely the rank-1 vector
outer product, (3) piecewise-stationary, namely iterative revisions, and (4) piecewise-convex, namely local
gradient descents. In these controlled approximations, the interwoven complex principles become decoupled
and amenable to powerful computer simulations. Although NI has come a long way since then, there remains a
missing fifth non-principle. Such a non-programming learning principle has been claimed by some, but the
hidden teachers/programmers remain to be unraveled for most of us, for pedagogical reasons. This is the state of
the art of neurocomputing theory.

Neurocomputing learning algebra is based on variants of Hebbian ideas. Given two random inputs,
the firing rates $u_i$ and $u_j$ of two neurons (about 100 Hz), there are only a limited number of algebraic
structures that one can manipulate to extract meaningful information. The change of the synaptic weight at the
interconnection between the ith and the jth neurons, $\Delta W_{ij}$, should be related to the inputs as follows:

• Correlation Learning: $\Delta W_{ij} \propto u_i u_j$

(maximum information-exchange rule between a pair of random firing rates)

• Gradient Learning: $\Delta W_{ij} \propto (D_i - v_i)\, u_j$

(error correction by a pre-set output goal $D_i$ that decides when the change of the actual output $v_i$
stops: the delta rule)

• Competitive Learning: $\Delta W_{ij} \propto u_i (u_j - W_{ij})$

(any change must balance against the old cluster establishment $W_{ij}$)

• Differential Learning: $\Delta W_{ij} \propto (du_i/dt)(du_j/dt)$

(only the time rates of change, derived by Taylor series expansion of $u_i(t)$, matter)
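
The four update rules listed above can be written as one-line functions. The following is a minimal sketch, not the authors' implementation: the learning-rate parameter `eta` and all function names are assumptions introduced here to turn each proportionality into an equality.

```python
# Sketch of the four Hebbian-variant learning rules.
# ASSUMPTION: `eta` is a hypothetical proportionality constant (learning rate);
# the source only states proportionalities.

def correlation_update(u_i, u_j, eta=0.1):
    """Correlation learning: change proportional to u_i * u_j."""
    return eta * u_i * u_j

def gradient_update(d_i, v_i, u_j, eta=0.1):
    """Gradient (delta-rule) learning: change proportional to (D_i - v_i) * u_j."""
    return eta * (d_i - v_i) * u_j

def competitive_update(u_i, u_j, w_ij, eta=0.1):
    """Competitive learning: change proportional to u_i * (u_j - W_ij)."""
    return eta * u_i * (u_j - w_ij)

def differential_update(du_i_dt, du_j_dt, eta=0.1):
    """Differential learning: change proportional to the product of time rates."""
    return eta * du_i_dt * du_j_dt

def run_competitive(u_i=1.0, u_j=0.8, steps=200):
    """Iterate the competitive rule; the weight relaxes toward u_j."""
    w = 0.0
    for _ in range(steps):
        w += competitive_update(u_i, u_j, w)
    return w
```

Under these assumptions, repeated application of the competitive rule with a constant input pair drives the weight toward `u_j`, which is the "balance against the old cluster establishment" described above, while the gradient rule returns zero once the actual output equals the goal.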

REVIEW OF NEURAL NETWORK ARCHITECTURES 

Neural network architectures are important for parallel and distributed computing. There are: one layer
of Hopfield's Associative Memory (AM), two layers of Grossberg's Adaptive Resonance Theory (ART), and three
layers of Rumelhart's Back Error Propagation (BEP), as shown schematically in Fig. 3, Learning
Algorithm-Architecture, as follows.

• In the left-hand column of Fig. 3, similar inputs $X_i$ are mapped into similar outputs $Z_k$ in a feature
space. Such a (hetero-associative) matrix memory is formed by the vector outer product, denoted as
$[Z_k X_i^T]$, where the superscript T stands for the transpose that turns the column vector X (indexed by the
component i) into a row vector. Matrix memory is a static version of Hopfield neural networks,
because the fixed-point coding between the input and the output requires no learning. By fixed-point coding we
mean "write-by-outer-product" and "read-by-inner-product", using the matrix-vector operation without
iterations.

• In the middle column of Fig. 3, when similar inputs produce surprising outputs, an extra layer is
introduced to interpolate these abnormal results by means of supervised training. The difference $|D - Z|$
between the output Z and the desired output D is considered to be the error, which propagates backward by means
of a local gradient descent methodology. The system then has the potential for generalization. There are several
theories about the size of the so-called hidden layer and its ability to do abstraction (with more neurons
than input nodes) or generalization (with fewer neurons). The degrees of freedom must match the
number of sample classes to be classified, based on the orthogonal feature space min-max concept described in the
following section. In such a quasi-orthogonal storage, this rule seems reasonable for assigning credit or
blame.

• When the desired output D is not yet known, Grossberg's model of Adaptive Resonance Theory (ART)
becomes handy. It may be thought of as flipping down the unknown output layer in order to compare it directly
with the unknown input, as shown in the right-hand column of Fig. 3. The master has its own top-down wires
$T_{jk}$ (shown by dotted lines), while the donkey has its own bottom-up wires $b_{ij}$. In order to carry out
the follow-the-leader clustering technique automatically, the top-layer master puts his feet into the donkey's
input $x_j$ to test his own normalized prediction
$|\sum \langle x_j | T_{jk} | x_k \rangle| / |\sum \langle x_j | x_k \rangle|$ against a predetermined
parameter, called the vigilance parameter, between 0.5 and 0.9. Therefore, the difference between traditional
control theory with negative feedback and the neural network is that both the incentive (carrot) and the
punishment (stick) are used in the biological model, with excitation and inhibition exerted at different parts of
the self-organized system.
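
The "write-by-outer-product" and "read-by-inner-product" operations of the matrix memory in the first item above can be illustrated with a small sketch. The bipolar, mutually orthogonal input vectors and the sign threshold applied on readout are assumptions chosen here so that recall is exact; they are not taken from the source.

```python
# Sketch of a hetero-associative matrix memory M = sum_k Z_k X_k^T.
# ASSUMPTION: inputs are orthogonal bipolar vectors and readout is sign-thresholded.

def outer(z, x):
    """Vector outer product z x^T as a list of rows."""
    return [[zk * xi for xi in x] for zk in z]

def write_memory(pairs):
    """Write by outer product: accumulate Z X^T over all (X, Z) pairs."""
    n_out, n_in = len(pairs[0][1]), len(pairs[0][0])
    memory = [[0] * n_in for _ in range(n_out)]
    for x, z in pairs:
        for r, row in enumerate(outer(z, x)):
            for c, value in enumerate(row):
                memory[r][c] += value
    return memory

def read_memory(memory, x):
    """Read by inner product: one matrix-vector product, no iterations."""
    raw = [sum(m * xi for m, xi in zip(row, x)) for row in memory]
    return [1 if value >= 0 else -1 for value in raw]

x1, z1 = [1, 1, 1, 1], [1, -1]
x2, z2 = [1, -1, 1, -1], [-1, -1]   # x1 and x2 are orthogonal
memory = write_memory([(x1, z1), (x2, z2)])
recalled_1 = read_memory(memory, x1)
recalled_2 = read_memory(memory, x2)
```

Because the stored inputs are orthogonal, the cross-talk term vanishes and each input recalls its own output in a single pass, which is the sense in which no learning iterations are required.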

Figure 3. Review of Learning Algorithm-Architecture.

An interesting taxonomy dilemma arises from the ambiguity between counting layers of neurons and counting
layers of interconnects. The single-layer Hopfield architecture seems to have two layers of
neurons when compared with the three layers of the Rumelhart architecture. On the other hand, the Hopfield
architecture is considered to be a single layer in a VLSI design. This dilemma may be resolved by asking: which is
more important in counting, the layers of interconnect synaptic weights or the layers of neurons? Since the
synaptic weights contain the important memory information, Hopfield's network should be counted as one
layer.

Designs of Energy Cost Functions in a Neuronic Vectorial Representation

An important question for practical applications is how to speed up the training process and ensure
fast convergence of the weight adjustment. We have suggested a general procedure of Taylor series expansion of
the clustering-declustering mini-max energy to estimate the synaptic weights. Here, we extend the procedure by a
self-consistent variational technique that makes the truncated higher-order terms of the Taylor series negligible.

A top-down design of a hard-wired neural network algorithm was initiated by Hopfield et al. for
constrained optimizations. We consider a supervised top-down design that goes beyond Hopfield's attempt. The
minimum clustering of the alike and the maximum declustering of the unlike seem to be two contradicting
goals. A tradeoff can be mathematically constructed by placing the linear combination of the alike pairs in the
numerator and that of the unlike pairs in the denominator of a mini-max energy formalism (schematically shown
in the cost energy expression of Fig. 4).

Figure 4. A top-down design of Neural Networks. (Panel titles: Top-Down Design of Hard-Wired Neural Networks;
Mini-Max Energy Principle.)


Let us consider an application to pattern classification. The class of physically different objects {o, O,
p, P, q, Q} needs to be cleverly pre-processed, by a smart sensor mimicking our eyes or by ourselves, and then
endowed with our wisdom about how we classify the set, through a functional mapping into a feature space
{o($V_i$), O($V_i$), p($V_i$), P($V_i$), q($V_i$), Q($V_i$)} spanned by a sufficient set of neurons $V_i$
mimicking the human visual system of the brain (Szu & Scheff 1989). The first term of the energy, in the
denominator, is similar to the Coulomb energy of repulsive electric charges (the reduced Coulomb energy model of
Cooper et al., and the Lorentz forces of Sayeh et al.), and the second term, in the numerator, is similar to the
least-squares method (which, when raised to an arbitrary power, becomes Kohonen's kth-norm clustering method).
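
As a rough illustration of the mini-max idea, the sketch below evaluates a ratio with the intraclass sample variances in the numerator and the squared separation of class centers in the denominator. This is one simple instance under stated assumptions: the equal weighting of terms, the squared Euclidean distance, and the function names are choices made here, not the exact energy of Fig. 4.

```python
# Sketch of a mini-max (clustering / declustering) energy.
# ASSUMPTION: unit weights and squared Euclidean distance; hypothetical helper names.

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def minimax_energy(classes):
    """classes maps a label to a list of feature vectors.

    Numerator: sum over classes of the sample variance about the class center.
    Denominator: sum over distinct class pairs of the squared center separation.
    """
    centers = {label: mean_vector(vs) for label, vs in classes.items()}
    intra = sum(
        sum(squared_distance(v, centers[label]) for v in vs) / len(vs)
        for label, vs in classes.items()
    )
    labels = sorted(classes)
    inter = sum(
        squared_distance(centers[a], centers[b])
        for i, a in enumerate(labels)
        for b in labels[i + 1:]
    )
    if inter == 0:
        raise ValueError("class centers coincide; the energy is undefined")
    return intra / inter

tight = {"o": [[0, 0], [0, 1]], "O": [[10, 0], [10, 1]]}
mixed = {"o": [[0, 0], [10, 0]], "O": [[0, 1], [10, 1]]}
```

With these toy data, the well-separated grouping `tight` gives a much lower energy than the poorly separated grouping `mixed`, so minimizing the ratio favors tight clusters and distant centers at the same time.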

While the first-order derivative is reserved for the aforementioned neurodynamics equations, the
second-order derivative, when evaluated at the equilibrium values $V_i = V^{(0)}_i$, $V_j = V^{(0)}_j$, becomes
the Taylor coefficient

$T^{(0)}_{ij} = (\partial^2 E / \partial V_i \partial V_j) \big|_{V_i = V^{(0)}_i,\; V_j = V^{(0)}_j}$

Then the Hopfield-like hard-wired interconnects $T^{(0)}_{ij}$ become soft-wired $T_{ij}$ by means of Hebbian
learning, which makes the cubic order negligible:

$T_{ij} = T^{(0)}_{ij} + \epsilon\, \delta V_i\, \delta V_j$

$T_{ijk} \big|_{V_i = V^{(0)}_i + \delta V_i,\; V_j = V^{(0)}_j + \delta V_j} \ll T^{(0)}_{ij}$

Similarly, the procedure can be extended to three layers:

$T_{ijk} = T^{(0)}_{ijk} + \epsilon f(\delta V_i\, \delta V_j\, \delta V_k)$

which makes the next, fourth-order derivative negligible. The case of a hidden-layer architecture means that
$T_{ijk}$ is a block-diagonalized tensor in which the ith input layer cannot communicate with the kth output
layer without going through the jth hidden layer of neurons.
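
For reference, the coefficients above arise from the standard multivariate Taylor series of the energy about the equilibrium point. The expansion below is a textbook identity written in the notation of this section; identifying the third-order coefficient with $T^{(0)}_{ijk}$ is an assumption made here by analogy with the second-order definition.

```latex
E(V) = E(V^{(0)})
     + \sum_i \left.\frac{\partial E}{\partial V_i}\right|_{V^{(0)}} \delta V_i
     + \frac{1}{2!} \sum_{i,j} T^{(0)}_{ij}\, \delta V_i\, \delta V_j
     + \frac{1}{3!} \sum_{i,j,k} T^{(0)}_{ijk}\, \delta V_i\, \delta V_j\, \delta V_k
     + \cdots,
\qquad
T^{(0)}_{ijk} = \left.\frac{\partial^3 E}{\partial V_i\, \partial V_j\, \partial V_k}\right|_{V^{(0)}}
```

Truncating after the quadratic term gives a pairwise (single-layer) interconnect, while keeping the cubic term gives the three-index tensor associated with the three-layer case.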

We can show that a single layer of a fully interconnected Hopfield network of five neurons with 25
interconnects can be reduced by the use-it-or-lose-it principle to 6 interconnects. Without actual physical
rearrangement, it becomes topologically equivalent to a three-layer Minsky net when 2 input neurons
and 1 output neuron are clamped and trained repeatedly with the "exclusive OR" input-output relationship. This
illustrates the second computing principle, which can be used not only to determine the learning algorithm but
also to derive the neural network architectural change consistently.

The experimental aspect of the unified learning theory has been demonstrated by NTT scientists using several
live neurons extracted from the hippocampus of a chicken brain. In delayed video recordings they have shown that
neuronic hair fibers $T_{ij}$ grow to seek out nutrition and other neurons in a competitive learning fashion.
The winning hair fiber grows fatter into a mature axonic interconnect, while the losing one shrinks away, on an
electronic chip substrate covered with life-sustaining liquid. The present unified theory can explain
such a growing synapse because of the extended McCulloch-Pitts neuron model, with two transfer functions for two
independent degrees of freedom, namely the sigmoidal firing rate transfer function and the synaptic transfer
function. Such a model has been named the hairy neuron neural network (Szu 1989).

NEUROCOMPUTING IS MORE THAN PARALLEL COMPUTING 

The famous von Neumann bottleneck, 10^ operations per second (ops), of a sequential computer has been
circumvented by parallel computing models, which require lock steps among multiple processors controlled by a
precision clock cycle. This has unfortunately created a second bottleneck, 10^ ops, which I wish to call the
five-W bottleneck, namely the "who should do what, when, where and how" bottleneck, due to the necessary tradeoff
between actual execution and the communication needed for timing and assigning jobs among multiple pipelines.
Therefore, the following asynchronous neurocomputers are fundamentally important and can make possible a
cheaper VLSI fabrication of neurocomputers. Although the fabrication advantages of not demanding
timing accuracy are conceivable, without neuronic processor timing the dynamics of when and how the
collective computing finishes requires mathematical assurance. Thus, we will prove three theorems for three
neurocomputing learning mechanisms with hard-wired, soft-wired, and brittle-wired interconnects. Our purpose
is to point out the possibility of allowing the system to determine its own topological structure by means of a
dynamically reconfigurable hairy neuron model described below. In order to minimize the overall energy,
dynamically reconnected neurofilaments $T_{ij}$ (located at the protein-mediated output axons) can play a role
as important as the weight adjustments of the synaptic junctions $W_{jk}$ (located at the ion-mediated input
dendrite tree). The extra degree of freedom of the hairy neurons is the synaptic transfer function, which has a
nonnegative slope

$T_{ij} = f(W_{ik}); \quad (dT_{ij}/dW_{ik}) \ge 0$

while the McCulloch-Pitts neuron model has one internal degree of freedom, prescribing the firing rate transfer
(squash) function

$V_i = g(U_i); \quad (dV_i/dU_i) \ge 0$

The following three convergence theorems all depend on the mathematical truth that
$(d(\text{any real quantity})/dt_i)^2 \ge 0$ with respect to any time axis:

$t_i = t^{(0)}_i + \epsilon_i t,$

where the information arrival time has an arbitrary initial time $t^{(0)}_i$ and a positive time scale factor
$\epsilon_i > 0$ with respect to a collective or universal time axis t.


(1) Hopfield-like Asynchronous Computing by Hard-Wired Nets $E_1(V)$


We consider first a system of hard-wired neural networks. We assume a network activity energy $E_1(V)$
in terms of the output firing rate vector V with components $V_i$, whose index i runs from one neuron to a
million, e.g. the mega-Cray. We can use either $E_1(V_i)$ or $E_1(V)$. The input firing rate to the ith neuron is
wired according to the McCulloch-Pitts model with the bias $\theta_i$:

$U_i = \sum_j W_{ij} V_j + \theta_i.$   (1)

The synaptic weight $W_{ij}$ at the jth junction of the input dendrite of the ith neuron has a physical gap,
analogous to that of a spark plug, through which the ion-mediated firing rates from the other outputs $V_j$ are
collected. Hebbian learning would then mean changing the spark plug gap to tune up the firing rates of the car
engine. Due to the diffusion of discrete ions through those synaptic junctions, the firing rate fluctuates like a
discrete time series at the molecular time scale t, on the order of one millisecond. The information flow, with a
reduced fluctuation of the neurotransmitters, plays an important annealing role in the global convergence of the
neurodynamics.

Each neuron can operate on its own time axis:

$t_i = t^{(0)}_i + \epsilon_i t,$   (2)

where the information arrival time has an arbitrary initial time $t^{(0)}_i$ and a positive time scale factor
$\epsilon_i > 0$ with respect to a collective or universal time axis t. This asynchronicity is essential to
account for the different information flow rates due to biological inhomogeneity at the neuronic level.

The total input is instantaneously mapped to the output by a nonlinear transfer function g:

$V_i = g(U_i)$   (3a)

A squash function, known in biology as a sigmoidal function, is often used,

$g(x) = 1/(1 + \exp(-x))$   (3b)

for the simplicity of its analytic slope:

$dg/dx = g(1 - g) \ge 0,$   (3c)

which vanishes at g = 0, when the neuronic decision means no, or at g = 1, meaning yes. This set of Eqs. (3a, b, c)
describes an analog model of McCulloch-Pitts neurons. The original proof of convergence by Hopfield explicitly
uses a quadratic energy expression among neurons for easy analog VLSI implementation. An independent
proof has been given by Cohen and Grossberg that does not require the symmetry property of interconnects.

Each fine-grained processor has been modeled in this paper by a different propagation speed governed by
the first-order equation:

$(dU_i/dt_i) = -(\partial E_1(V)/\partial V_i),$   (4)

driven by a local energy gradient.

The collective answer should emerge at (dE/dt) = 0 from the seemingly random computing without
lock-clock synchronization. With respect to the collective time, the macroscopic irreversibility
$(dE/dt) \le 0$ will be guaranteed.


Theorem I: Asynchronous Convergence based on $(dE_1(V)/dt) \le 0$.


If the neural network energy $E_1(V)$ depends only on the set V of all output firing rates $V_i$, and if and only
if an arbitrary transfer function $V_i = g(U_i)$ has a nonnegative slope, $(dV_i/dU_i) \ge 0$, then the change of
each neuron input $U_i$, governed by its own time axis through the first-order dynamics
$(dU_i/dt_i) = -(\partial E_1(V)/\partial V_i)$, where $t_i = t^{(0)}_i + \epsilon_i t$ with $\epsilon_i > 0$,
will guarantee the monotonic convergence $(dE_1/dt) \le 0$.

Proof: The differential increment in time must maintain the direction of the time flow, Eq. (2),
implying a positive characteristic factor,

$dt_i = \epsilon_i\, dt$, or $(dt_i/dt) = \epsilon_i > 0.$

The energy gradient is, so to speak, the force upon the axonic output that changes the firing rate of the total
dendritic input, Eq. (1). Nonetheless, the global energy converges with respect to the collective or universal time:

$(dE_1(V)/dt) = \sum_i (\partial E_1/\partial V_i)(dV_i/dt_i)(dt_i/dt)$   (5a)

$= -\sum_i \epsilon_i (dU_i/dt_i)(dV_i/dt_i)$   (5b)

$= -\sum_i \epsilon_i (dU_i/dt_i)^2 (dV_i/dU_i)$   (5c)

$\le 0.$   (5d)

Eq. (5a) is obtained by the chain rule of differentiation; in Eq. (5b), use is made of the Newtonian Eq. (4) to
eliminate the energy slope; in Eq. (5c), the identity $(dV_i/dt_i) = (dV_i/dU_i)(dU_i/dt_i)$ is used to produce
the second power of $(dU_i/dt_i)$. The last inequality, Eq. (5d), is based on the mathematical truth that the
square of an arbitrary real number,

$(dU_i/dt_i)^2 = (\text{real number})^2 \ge 0,$

must be nonnegative in any time scale, together with the nonnegative slope $(dV_i/dU_i) \ge 0$.

In the general convergence proof for an arbitrary time axis $t_i$ with $\epsilon_i > 0$, we require no detailed
structure of the energy function other than that it be once differentiable. Thus, we have indeed verified the
intuition that nothing changes, $(dE_1/dt) = 0$, at the moment of convergence. This theorem may be called the
first asynchronous neurocomputing principle, which predicts the macroscopic irreversibility $(dE_1/dt) \le 0$
from the microscopically reversible but time-asynchronous neurodynamics, Eq. (4). The irreversibility is due to
the necessary and sufficient condition on the nonlinear transfer function g (which is equivalent to the
Stosszahlansatz of the binary collision transfer function in the Boltzmann transport equation). Although the
proof is similar to the Lyapunov theorem in standard control theory, the learning mechanism in bio-control theory
has been left unanswered.
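
Theorem I can be checked numerically on a small example. The sketch below integrates Eq. (4) with a forward-Euler step, giving each neuron its own time scale factor as in Eq. (2), and records the energy. Several things are assumptions made here for illustration and are not from the source: the quadratic energy with symmetric random weights, the step size `dt`, the range of the time scale factors, and all function names.

```python
import math
import random

# ASSUMPTION: quadratic energy E_1(V) = -1/2 sum_ij W_ij V_i V_j - sum_i theta_i V_i
# with symmetric W; forward-Euler integration with a small hypothetical step dt.

def sigmoid(x):
    """Squash function g(x) = 1 / (1 + exp(-x)), Eq. (3b)."""
    return 1.0 / (1.0 + math.exp(-x))

def energy(W, theta, V):
    n = len(V)
    quadratic = sum(W[i][j] * V[i] * V[j] for i in range(n) for j in range(n))
    return -0.5 * quadratic - sum(theta[i] * V[i] for i in range(n))

def simulate(n=5, steps=2000, dt=0.01, seed=0):
    """Return the energy at each step of the asynchronous dynamics."""
    rng = random.Random(seed)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            W[i][j] = W[j][i] = rng.uniform(-1.0, 1.0)   # symmetric interconnects
    theta = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    eps = [rng.uniform(0.5, 2.0) for _ in range(n)]       # one time scale per neuron
    U = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    energies = []
    for _ in range(steps):
        V = [sigmoid(u) for u in U]
        energies.append(energy(W, theta, V))
        # -dE_1/dV_i = sum_j W_ij V_j + theta_i for symmetric W
        force = [sum(W[i][j] * V[j] for j in range(n)) + theta[i] for i in range(n)]
        # dU_i/dt = eps_i * dU_i/dt_i = eps_i * force_i
        U = [U[i] + eps[i] * dt * force[i] for i in range(n)]
    return energies

energies = simulate()
is_monotone = all(b <= a + 1e-9 for a, b in zip(energies, energies[1:]))
```

For a sufficiently small step, the recorded energy should never increase even though every neuron advances at a different rate, which is the content of Eq. (5d). The discretization itself is not part of the theorem, so a large `dt` could violate monotonicity.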

(2) Rumelhart-like Weight-Adjustment Learning: Soft-Wired $E_2(W_{ij})$

Due to biological inhomogeneity, the energy gradient descent methodology may be slightly
generalized to a time-asynchronous learning algorithm in which each neuron can have its own time axis:

$(dW_{ij}/dt_i) = -(\partial E_2(W_{ij})/\partial W_{ij}).$   (6)

Rumelhart et al. have applied Eq. (6) to a feed-forward, fixed-layer architecture with synchronized
layers of neurons: $\epsilon_i = 1$. A slightly generalized convergence proof for time-asynchronous
neurocomputing is given as follows:


Theorem II: Synaptic Adjustment Convergence:


$(dE_2(W_{ij})/dt) = \sum_{i,j} (\partial E_2/\partial W_{ij})(dW_{ij}/dt)$   (7a)

$= -\sum_{i,j} (dW_{ij}/dt_i)(dW_{ij}/dt_i)(dt_i/dt)$   (7b)

$= -\sum_{i,j} \epsilon_i (dW_{ij}/dt_i)^2$   (7c)

$\le 0$   (7d)


The adjustment of the synaptic weights $W_{ij}$ can be derived implicitly in terms of the square error of the
desired output D from the actual output V, when a given input U is fed into the layered network. Such a
methodology is known as backward error propagation, resulting in a delta learning rule that assigns the credit or
the blame to the other layered neurons behind them. To illustrate both energy functions $E_1(V_i)$ and
$E_2(W_{ij})$, we assume

$E_2(V_i(W_{ij})) = (1/2) \sum_i (D_i - V_i)^2$

to be the square error of the desired response $D_i$ from the actual output $V_i$, which depends, through the
analytical transfer function g of the input $U_i = \sum_j W_{ij} V'_j + \theta_i$, Eq. (1), on the upward-link
synaptic weights $W_{ij}$. We denote the set of (input, actual output, desired output) respectively as
($U_i$, $V_i$, $D_i$). If there is no error, $(V_i - D_i) = 0$, no learning takes place. The upward-link weights
$W_{ij}$ are adjusted to reduce the difference, by multiplying both sides of Eq. (6) by the time-dependent factor
$\epsilon_i$:

$\epsilon_i (dW_{ij}/dt_i) = \Delta W_{ij} = -\epsilon_i (\partial E/\partial W_{ij})$

$= -\epsilon_i \{(\partial E/\partial V_i)(dV_i/dU_i)\}(\partial U_i/\partial W_{ij})$

$= -\epsilon_i \{(V_i - D_i) V_i (1 - V_i)\} V'_j$   (9a)

where straightforward differentiation has produced the result.

The delta learning formula is the input energy change,
$-(\partial E/\partial V_i)(dV_i/dU_i) = -(\partial E/\partial U_i) \equiv \delta_i$, with respect to the
top-layer input $U_i = \sum_j W_{ij} V'_j + \theta_i$ in terms of the upward synaptic links $W_{ij}$. Such an
energy change at the top-layer input is propagated downward to the input energy change with respect to the
hidden-layer input $U'_k = \sum_m W'_{km} V''_m + \theta'_k$ in terms of the downward synaptic links $W'_{kj}$:

$\delta_i = -(\partial E/\partial U_i) = -\sum_k (\partial E/\partial U'_k)(\partial U'_k/\partial U_i)$

$= \sum_k \delta'_k \sum_m (\partial U'_k/\partial V''_m)(\partial V''_m/\partial V_i)(dV_i/dU_i)$

$\cong (dV_i/dU_i) \sum_k \delta'_k W'_{ki}$   (9b)

where the approximate equality sign $\cong$ is due to the replacement of the unknown top-layer input $V_i$ with
the known bottom-layer input $V''_m$. Thus, the delta learning rule remains approximately independent of the
neuronic time axes:

$\delta_j = V_j (1 - V_j) \sum_k \delta'_k W'_{kj}$   (9c)
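
The closed form in Eq. (9a) can be verified against a finite-difference derivative of the square error. The sketch below does this for a single sigmoid layer with the time scale factor set to one. The numerical values, the step `h`, the learning rate `eta`, and the function names are all assumptions chosen here for illustration.

```python
import math

# ASSUMPTION: single sigmoid layer, eps_i = 1, hypothetical weights and inputs.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(W, theta, v_in):
    """V_i = g(sum_j W_ij V'_j + theta_i)."""
    return [sigmoid(sum(w * x for w, x in zip(row, v_in)) + t)
            for row, t in zip(W, theta)]

def square_error(W, theta, v_in, D):
    """E_2 = 1/2 sum_i (D_i - V_i)^2."""
    V = forward(W, theta, v_in)
    return 0.5 * sum((d - v) ** 2 for d, v in zip(D, V))

def delta_rule_gradient(W, theta, v_in, D):
    """dE/dW_ij = (V_i - D_i) V_i (1 - V_i) V'_j, the bracket of Eq. (9a)."""
    V = forward(W, theta, v_in)
    return [[(v - d) * v * (1.0 - v) * x for x in v_in] for v, d in zip(V, D)]

def numerical_gradient(W, theta, v_in, D, h=1e-6):
    """Central finite differences of the square error, one weight at a time."""
    grad = []
    for i in range(len(W)):
        row = []
        for j in range(len(W[i])):
            plus = [r[:] for r in W]
            minus = [r[:] for r in W]
            plus[i][j] += h
            minus[i][j] -= h
            row.append((square_error(plus, theta, v_in, D)
                        - square_error(minus, theta, v_in, D)) / (2.0 * h))
        grad.append(row)
    return grad

W = [[0.2, -0.4, 0.1], [0.7, 0.3, -0.5]]
theta = [0.1, -0.2]
v_in = [1.0, 0.5, -1.5]
D = [1.0, 0.0]

analytic = delta_rule_gradient(W, theta, v_in, D)
numeric = numerical_gradient(W, theta, v_in, D)
max_gap = max(abs(a - b) for ra, rb in zip(analytic, numeric) for a, b in zip(ra, rb))

eta = 0.5  # hypothetical learning rate
W_new = [[w - eta * g for w, g in zip(rw, rg)] for rw, rg in zip(W, analytic)]
```

The analytic and numerical gradients should agree to within the finite-difference error, and one step along the negative gradient should lower the square error for this example.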

(3) Morphology Convergence for Hairy Neurons with Brittle-wired $E_3(V_i; T_{ij})$

In this section, we wish to formulate a set of neurodynamics equations that can settle into an
appropriate network architecture, e.g. one layer of Hopfield, three layers of Rumelhart, or two layers of
Grossberg. Neurophysiological experiments have recently shown that an active neuron can grow hairy
neurofilaments, denoted as $T_{ij}$, in competition with other neurons for nutrition and networking partnership;
this has been called the hairy neuron model (Szu 1989). The distinction between the input synaptic weight
$W_{jk}$ and the output axonic neurofilament $T_{ij}$ is necessary because of recent neurophysiological
experiments: (1) the use-it-or-lose-it synaptic pruning in the one eye jack experiment on a newborn kitten, and
(2) the actin protein generating the growth of neurofilaments. These neurofilament hair lines compete for food
and partnership. The winner grows fatter, while the loser shrinks thinner. The actively growing neurofilament
$T_{ij}$ reaches out and touches another neuron, eventually matures, and retracts itself to form a physical gap,
the synaptic junction $W_{jk}$, for better resistive control of the ion diffusion potential without the initial
direct contact. In order to take into account the possibility of the pruning of synapses $W_{ik}$ (1) and the
active growth of neurofilaments $T_{ij}$ (2), the synaptic weights $W_{ik}$ at the kth junction of the ith
neuronic dendrite tree are assumed to be dormant variables, while the neurofilaments $T_{ij}$ located at the ith
axonic output can grow into the jth neuron through the active treadmill microtubule assembly mechanism. Thus, we
have extended the classical McCulloch-Pitts neuron model, Eq. (2), to include one more degree of freedom, the
synaptic transfer function

$T_{ij} = f(W_{ij})$   (10a)

between the axonic filaments $T_{ij}$ (protein actin-driven for dynamic growing/pruning) and the dendrite
synapses $W_{ij}$ (positive ion-driven firing rates). The biological survival principle, use it or lose it, can
be applied at the neuron level to explain the observed reduction of synaptic gap density by a pruning mechanism
in the one eye jack experiment on a newborn kitten. In this experiment, a patch was placed over one eye of a
newborn kitten. During the post-natal development of its brain, the patched eye had no optical input, and the
corresponding optical processing neural networks died off, leaving the kitten's physically normal eye
functionally blind. It would take lifelong training to regain binocular vision. The synaptic transfer function,
Eq. (10a), becomes, in the newborn or high-gain limit, a binary step function with threshold b and step size a:

$f(x) = a\, \mathrm{step}(x - b)$   (10b)

in the first pass a blurred template that had the correct statistics of image pieces, through the straightforward
pointing-and-tracking summation of many frames (about 16 distorted fields) according to the centroid of the
whole frame (Szu & Blodgett 1982) (cf. Distorted Fields, Object, Long Term Average, Centroid Correction). This
effect demonstrated the need for a smart sensor concept, such as the eye, which can see a weak star during an
"instance of good seeing" (Szu et al. 1980) through the turbulent sky. By contrast, the undiscriminating and
dumb telescope camera can only produce a blurred, overexposed picture of the weak star by whole-frame
summation based on the straightforward pointing-and-tracking gimbal, without any adaptive phase correction
mechanism for the turbulent medium.

Recently, a sequence of distorted imagery consisting of a training set of 15 samples of hand-written
characters (each of 4 by 4 pixels, trained to recognize only 3 classes) has demonstrated the ability of
generalization: recognizing a new class of letter (Szu & Scheff 1989). This was done by means of critical feature
extraction using the "mini-max concept" to discover by itself a new class of 5 more hand-written characters by
analyzing the "intra-interclass clustering property" on the self-constructed feature space (cf. Fig. 6 for 20
samples of 4 classes). This example used a table-top computer, because the Gram-Schmidt orthogonal feature
extraction was based on the associative memory employing the Fixed-Point Cycle Two Theorem (Szu & Scheff
1989). Such a procedure of parallel Gram-Schmidt constrained orthogonalization could be exceedingly useful
for covert communication constrained by call signs and known scrambling instructions, because feature extraction
by means of straightforward projection is not permitted to obliterate critical portions of the signal. However,
any practical construction of a large set of orthogonal feature vectors could be subject to a real-time processing
bottleneck. In this paper, the Fast Simulated Annealing (FSA) technique is adopted to alleviate the bottleneck
problem.

Image processing by annealing techniques has been attempted (Geman & Geman 1984) (Smith et al.
1983), mainly for noise/distortion reduction. Neural networks have recently been applied to pattern recognition
by Kohonen, Fukushima, Grossberg, Hopfield, and others. White noise annealing and neural networks are combined
in the Boltzmann Machine (Hinton, Sejnowski, Ackley 1984), whose colored noise variant has been
referred to as the Cauchy Machine (Szu 1987) (Scheff & Szu 1987) (Takefuji & Szu 1989).
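
To make the annealing idea concrete, the following is a minimal one-dimensional sketch of simulated annealing with Cauchy-distributed jumps and a cooling schedule inversely proportional to the step count. It is an illustration under assumptions rather than the Cauchy Machine itself: the Metropolis-style acceptance test, the toy double-well energy, and the function names are choices made here.

```python
import math
import random

# ASSUMPTION: Metropolis-style acceptance and a toy 1-D energy; names are hypothetical.

def cauchy_temperature(t0, k):
    """Fast cooling schedule T(k) = T0 / (1 + k)."""
    return t0 / (1.0 + k)

def cauchy_step(temperature, rng):
    """Draw a Cauchy-distributed jump with scale equal to the temperature."""
    return temperature * math.tan(math.pi * (rng.random() - 0.5))

def fast_simulated_annealing(energy, x0, t0=1.0, steps=5000, seed=0):
    """Return the best state and energy visited during the anneal."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for k in range(1, steps + 1):
        temperature = cauchy_temperature(t0, k)
        candidate = x + cauchy_step(temperature, rng)
        e_new = energy(candidate)
        delta = e_new - e
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            x, e = candidate, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

def double_well(x):
    """Toy nonconvex energy with a local minimum near +1 and a lower one near -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x

best_x, best_e = fast_simulated_annealing(double_well, x0=2.0)
```

The heavy tail of the Cauchy distribution permits occasional long jumps, which is what allows a faster cooling schedule than white (Gaussian) noise annealing. Whether a given run escapes the local minimum depends on the seed and the parameters.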

SPATIO-TEMPORAL IMAGERIES 

A useful clutter rejection hypothesis is that man-made vehicles are designed to minimize hydrodynamic drag
via streamlined shapes and wheels, while the natural environment of tree trunks is mainly vertical, against
gravity (unpublished work of J. Landa and H. Szu). Thus, a sequence of images of land vehicles passing by bushes
is considered (Fig. 7(a)). When a land vehicle moves past a tree, the partial occlusion of the vehicle by the
tree trunk can be easily overcome by proper pointing, tracking, zooming, and imaging on the moving vehicle. The
image sequence can be averaged and thresholded to remove the relative motion between the tree and the vehicle
(Fig. 7(b), shown together with the 9 by 9 scanning Peano curve). The centroid pointing and tracking of the
vehicle is assumed to produce the averaged gray-scaled image $\langle I_c(x,y) \rangle$:

$$\langle I_c(x,y) \rangle = \frac{1}{\text{frames}} \sum_j I_j(x + x_c,\, y + y_c) \tag{13}$$

where $(x_c, y_c)$ is the local centroid coordinate of the vehicle. After thresholding at a suitable level, the
obscuring effect of the tree and bush is minimized, which yields the templates (Fig. 7):

$$I_c(x,y) = \mathrm{Threshold}\big( \langle I_c(x,y) \rangle \big) \tag{14}$$
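The averaging and thresholding of Eqs. (13) and (14) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names `average_frames` and `threshold`, the treatment of pixels shifted outside the frame as zero, the threshold level, and the tiny 3 by 3 frames are all assumptions made for the example.

```python
def average_frames(frames, centroids):
    """Eq. (13): average the frames after shifting each by its centroid (xc, yc).

    frames    -- list of equally sized 2-D lists of gray levels, indexed [y][x]
    centroids -- one integer (xc, yc) offset per frame
    Pixels that fall outside a frame after the shift count as 0 (assumption).
    """
    height, width = len(frames[0]), len(frames[0][0])
    total = [[0.0] * width for _ in range(height)]
    for frame, (xc, yc) in zip(frames, centroids):
        for y in range(height):
            for x in range(width):
                sx, sy = x + xc, y + yc
                if 0 <= sx < width and 0 <= sy < height:
                    total[y][x] += frame[sy][sx]
    n_frames = len(frames)
    return [[value / n_frames for value in row] for row in total]


def threshold(image, level):
    """Eq. (14): binarize the averaged image (1 = feature pixel)."""
    return [[1 if value >= level else 0 for value in row] for row in image]


# Two hypothetical frames: the vehicle (gray level 9) moves one pixel to the
# right, while a piece of clutter (gray level 7) appears only in the first frame.
frames = [
    [[0, 9, 0], [0, 9, 0], [7, 0, 0]],
    [[0, 0, 9], [0, 0, 9], [0, 0, 0]],
]
centroids = [(0, 0), (1, 0)]
template = threshold(average_frames(frames, centroids), level=5)
```

Because the frames are aligned on the vehicle before averaging, the vehicle pixels keep their full gray level while the clutter is diluted and falls below the threshold.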

Let the critical feature of the template class c be denoted as $f_c(x,y)$. The performance criterion then
combines, in the numerator, the minimum distance between each feature and the template of its class (c = 1, 2)
together with the direction cosine between features, and, in the denominator, the maximum difference between
feature vectors. Thus, the mini-max filter energy is

$$E(f_c) = a \sum_{c \neq c'} \left| \langle f_c \cdot f_{c'} \rangle \right| + b \sum_{c=1,2,\ldots} \left| f_c - I_c \right|^2 + \sum_{c \neq c'} \frac{d}{\left| f_c - f_{c'} \right|^2} \tag{15}$$

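where the coefficient of the direction cosine, computed via the inner product $\langle \cdot \rangle$, may be
heavily weighted, e.g. by setting a = 10 (relative to b = 1, c = 1, and d = 10). Here the weight c is distinct
from the class index c; in Appendix A, b and c weight the template-distance terms of classes 1 and 2,
respectively. The change of energy is defined as $\Delta E = E_{new} - E_{old}$.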
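For the two-class case, the energy of Eq. (15) can be sketched in Python using the same four sums that the Appendix A listing accumulates. The function name `minimax_energy`, the flat-list representation of the 81-pixel feature vectors, and the choice to return infinity when the two features coincide are assumptions of this sketch; the default weights are the example values quoted above.

```python
def minimax_energy(f1, f2, ave1, ave2, a=10.0, b=1.0, c=1.0, d=10.0):
    """Two-class mini-max energy, following the terms summed in Appendix A.

    f1, f2     -- candidate feature vectors of classes 1 and 2
    ave1, ave2 -- thresholded templates of classes 1 and 2
    """
    # Inner product between the two features (direction-cosine term).
    overlap = sum(p * q for p, q in zip(f1, f2))
    # Squared distance of each feature from its own class template.
    dist1 = sum((p - t) ** 2 for p, t in zip(f1, ave1))
    dist2 = sum((q - t) ** 2 for q, t in zip(f2, ave2))
    # Squared separation between the two features (denominator term).
    separation = sum((p - q) ** 2 for p, q in zip(f1, f2))
    if separation == 0:
        return float("inf")  # assumption: identical features are never acceptable
    return a * overlap + b * dist1 + c * dist2 + d / separation


# Hypothetical 3-pixel example.
energy = minimax_energy([1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 1, 1])
```

Minimizing this quantity rewards features that are nearly orthogonal, close to their own templates, and far from each other, which is the intra-class clustering and inter-class separation trade-off the text describes.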

CAUCHY MACHINE 

The image space is 2-D, but the search space can be 1-D, provided that a space-filling scanning technique
is adopted to map the 2-D image space to the 1-D search space while preserving the local neighborhood
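One way to obtain such a neighborhood-preserving scan of a 9 by 9 image is Peano's classical base-3 construction, sketched below in Python. Whether this is exactly the scan order used for Fig. 7 is an assumption; the function name `peano_xy` and the zero-based index (the appendix numbers pixels from 1 to 81) are likewise choices made for this sketch.

```python
def peano_xy(t):
    """Map a scan index t (0..80) to (x, y) on a 9x9 grid along a Peano curve."""
    if not 0 <= t < 81:
        raise ValueError("t must be in 0..80")
    # Base-3 digits of t, most significant first.
    t1, t2, t3, t4 = t // 27, (t // 9) % 3, (t // 3) % 3, t % 3

    def reflect(digit, times):
        # Apply the complement d -> 2 - d an odd or even number of times.
        return 2 - digit if times % 2 else digit

    x = 3 * t1 + reflect(t3, t2)
    y = 3 * reflect(t2, t1) + reflect(t4, t1 + t3)
    return x, y


path = [peano_xy(t) for t in range(81)]
```

Consecutive scan indices always land on horizontally or vertically adjacent pixels, so a small step in the 1-D search space is also a small step in the image.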

Figure 6. Hand-written character recognition by orthogonal feature extraction using constrained Gram-Schmidt
orthogonalization (GSO) procedure.

Figure 8. Cauchy simulated annealing search for the mini-max global minimum energy. Three segments of the
ordinate (top segment: searching 9x9 states, middle segment: accepted 9x9 states, and bottom segment: the energy
of the visited state) are plotted with respect to the abscissa of 2000 time points in three minutes of CPU time
on a Macintosh II.

Appendix A: Fast Simulated Annealing Algorithm (TRUE_BASIC Version)


! Input: Peano-scanning pixel numbers (1 to 81); a listed pixel is a black feature (= 1), Eq. (13).
! Each list should hold 29 values. The token printed as "94 1" in the first list is garbled in
! the source (most likely 9 followed by one illegible entry) and is kept as printed.
DATA 4,5,8,94 1,14,15,16,17,38,41,44,46,47,50,51,52,53,56,57,58,59,67,69,70,71,72,78,79
DATA 4,5,8,9,12,13,14,15,16,17,30,31,37,42,43,46,47,50,51,52,53,56,57,58,59,62,63,69,70

DIM f1(81),f2(81),ave1(81),ave2(81),ft1(81),ft2(81)
MAT ave2 = 0                                ! True_Basic matrix operation
FOR n = 1 to 29                             ! read an object into ave1, namely I1, Eq. (13)
    READ k
    LET ave1(k) = 1
NEXT n
FOR m = 30 to 58                            ! read another object into ave2, namely I2, Eq. (13)
    READ j
    LET ave2(j) = 1
NEXT m

! tmax, T0, the constants a, b, c, d, and the starting x and eold are not set in this listing;
! the constants are typed into the code at run time.
RANDOMIZE                                   ! random number rnd generated in [0,1]
FOR t = 1 to tmax                           ! after initializing the display
    LET temp = T0/(1+t)                     ! Fast Simulated Annealing cooling schedule
    LET theta = (rnd-.5)*Pi                 ! uniform theta using the radian angle option
    LET dx = int(temp*tan(theta))           ! new pixel by T tan(theta), Eq. (15)
    LET xnew = mod(x+dx,82)                 ! modulo for 81 scan pixels
    IF xnew = 0 then LET xnew = 81

    IF f2(xnew) = 0 THEN                    ! move pixel xnew from one trial feature to the other
        LET ft2(xnew) = ave2(xnew)
        LET ft1(xnew) = 0
    ELSE
        LET ft2(xnew) = 0
        LET ft1(xnew) = ave1(xnew)
    END IF

    LET enew = 0
    LET denominator = 0
    LET ef1 = 0
    LET ef2 = 0
    FOR n = 1 to 81
        LET ef1 = ef1+(ft1(n)-ave1(n))*(ft1(n)-ave1(n))
        LET ef2 = ef2+(ft2(n)-ave2(n))*(ft2(n)-ave2(n))
        LET denominator = denominator+(ft1(n)-ft2(n))*(ft1(n)-ft2(n))
        LET enew = enew+ft1(n)*ft2(n)
    NEXT n
    LET enew = a*enew + b*ef1 + c*ef2 + (d/denominator)

    IF enew < eold then                     ! downhill move: always accept
        MAT f2 = ft2
        MAT f1 = ft1
        LET eold = enew
        LET x = xnew
    END IF

    IF enew >= eold then
        IF (rnd*0.5) < (1/(1+exp((enew-eold)/temp))) then      ! hill climbing, Eq. (16)
            MAT f2 = ft2
            MAT f1 = ft1
            LET eold = enew
            LET x = xnew
        END IF
    END IF

    PLOT POINTS: t,xnew+200
    PLOT POINTS: t,x+100
    PLOT POINTS: t,eold/2
NEXT t
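
For readers without a True BASIC interpreter, the cooling schedule, the Cauchy-distributed step, and the acceptance test of the listing above can be sketched in Python as follows. The function names, the explicit `rng` argument, the overflow guard, and the starting values (`t0 = 50.0`, start pixel 41, seed 0) are assumptions of this sketch, not values taken from the paper.

```python
import math
import random


def temperature(t, t0):
    """Fast Simulated Annealing cooling schedule: T(t) = T0 / (1 + t)."""
    return t0 / (1 + t)


def cauchy_step(x, temp, rng, n_states=81):
    """Propose a new scan pixel by a Cauchy-distributed jump of scale temp."""
    theta = (rng.random() - 0.5) * math.pi      # uniform angle in [-pi/2, pi/2)
    dx = math.floor(temp * math.tan(theta))     # True BASIC INT rounds down
    x_new = (x + dx) % (n_states + 1)           # wrap as the listing does (mod 82)
    return n_states if x_new == 0 else x_new    # pixel numbers run from 1 to 81


def accept(e_new, e_old, temp, rng):
    """Acceptance rule of the listing: downhill always, uphill at random."""
    if e_new < e_old:
        return True
    z = (e_new - e_old) / temp
    if z > 700:                                  # guard: exp(z) would overflow
        return False
    return rng.random() * 0.5 < 1.0 / (1.0 + math.exp(z))


rng = random.Random(0)
x = 41
visited = []
for t in range(1, 2001):
    x = cauchy_step(x, temperature(t, 50.0), rng)
    visited.append(x)
```

The heavy tail of the tangent transform occasionally produces very long jumps even at low temperature, which is what lets the search escape local minima under the faster T0/(1+t) schedule.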